Abstract

Many practical problems are related to the pointwise estimation of dis- 
tribution functions when data contains measurement errors. Motivation 
for these problems comes from diverse fields such as astronomy, reliability, 
quality control, public health and survey data. 

Recently, Dattner, Goldenshluger and Juditsky (2011) showed that an 
estimator based on a direct inversion formula for distribution functions 
has nice properties when the tail of the characteristic function of the mea- 
surement error distribution decays polynomially. In this paper we derive 
theoretical properties for this estimator for the case where the error distri- 
bution is smoother and study its finite sample behavior for different error 
distributions. Our method is data-driven in the sense that we use only 
known information, namely, the error distribution and the data. Applica- 
tion of the estimator to estimating hypertension prevalence based on real 
data is also examined. 

Keywords: adaptive estimator, deconvolution, error in variables, preva- 
lence. 



1 Introduction 

This research is motivated by the problem of pointwise estimation of distri- 
bution functions in the presence of measurement errors (distribution decon- 
volution). Interest in this problem goes back to Eddington (1913) who was 
motivated by astronomical data. Gaffey (1959) studied the problem of
correcting for normal measurement errors in determining human cholesterol levels
while Scheinok (1964), motivated by reliability theory, studied this problem un- 
der the assumption that the errors are exponentially distributed. In a quality 
control context, Mee (1984) studied the problem of estimating the proportion 
of a product satisfying a lower specification limit when the available data are 
subject to measurement error. Different approaches for estimating the finite 
population cumulative distribution function were developed for survey data, see
Stefanski and Bay (1996) and references therein. Nusser, Carriquiry, Dodd, and
Fuller (1996) developed a semiparametric transformation approach to estimat- 
ing usual daily intake distributions while Cordy and Thomas (1997), motivated 
by similar problems, suggested modeling the unknown distribution as a mixture 
of a finite number of known distributions. Also in the context of survey data, 
Eltinge (1999) developed adjusted estimators of distribution functions or quantiles
for cases in which measurement errors are nonnormal. 

The methods developed in the papers cited above include both parametric 
and nonparametric approaches. In a nonparametric framework, a natural
approach is first to estimate the density and then to integrate it to
obtain the estimator for the distribution function. This type of estimator was
considered in Zhang (1990) and proved to be minimax optimal in Fan (1991) 
for the case of supersmooth error distributions. However, Fan (1991) was not 
able to show that this estimation method is optimal when the errors are or- 
dinary smooth (e.g., double-exponential errors). We note that in the case of 
direct observations Zhou and Harezlak (1996) observed that optimality in den- 
sity estimation does not carry over to distribution estimation. Recently, the case 
of ordinary smooth measurement errors was shown in Dattner, Goldenshluger 
and Juditsky (2011) to be a more delicate one. In their work, a different esti- 
mation method was considered, namely, estimation based on a direct inversion 
formula for distribution functions. This deconvolution estimator was proved to 
be minimax optimal with no tail conditions being assumed for the estimated 
distribution (as has been required in all previous work). Also, based on Lepski's 
adaptation procedure (Lepski (1990)) they developed an adaptive algorithm for 
implementing the deconvolution estimator. 

In this paper we study further the problem of distribution deconvolution and 
consider both theoretical and practical aspects of the problem. The theoretical 
results are for the case of a known error distribution as is generally discussed in 
the deconvolution literature. In particular, the contribution of this research is 
as follows. 

1. We show that a deconvolution estimator based on the direct inversion for- 
mula is minimax optimal also for supersmooth errors with no tail condi- 
tions being imposed on the estimated distribution. In addition, we develop 
the adaptive estimator for the supersmooth case and derive its statistical 
properties. 

2. We study the practical aspect of implementing the adaptive estimator 
through an extensive simulation study considering different error distri- 
butions and comparing it to the empirical distribution function and the 
SIMEX method. 

3. We apply the adaptive method to a real data example where one is in- 
terested in estimating hypertension prevalence in a population based on 
blood pressure measurements. 

The rest of this paper is organized as follows. In section 2 we describe the 
estimation method and present the relevant theory for the supersmooth case. 
In section 3 we present the simulation study while in section 4 we apply our
method to the real data example. A discussion follows in section 5 and proofs 
are provided in the appendix. 

2 The estimation method 
2.1 Deconvolution estimator 

The problem of estimating a distribution function in the presence of measurement
errors is formulated mathematically as follows. Let $X_1, \ldots, X_n$ be a
sequence of independent identically distributed random variables with common
distribution $F_X$. Suppose that we observe random variables $Y_1, \ldots, Y_n$ given by

$$Y_j = X_j + \epsilon_j, \quad j = 1, \ldots, n,$$   (1)

where the $\epsilon_j$ are independent identically distributed random variables, independent
of the $X_j$'s, with a known density $f_\epsilon$ w.r.t. the Lebesgue measure on the real line.
Our objective is to estimate the cumulative distribution function $F_X(x_0)$ at any
single given point $x_0 \in \mathbb{R}$ from the observations $Y_1, \ldots, Y_n$.

The deconvolution estimator presented in this paper is based on Fourier
methods for which we introduce the following notation. Denote the characteristic
function of a random variable $X$ by $\phi_X(\omega) := E e^{i\omega X}$, $\omega \in \mathbb{R}$, and let
$\Im\{z\}$ be the imaginary part of the complex number $z$. Now, consider the inversion
formula for a continuous distribution (see Gurland (1948), Gil-Pelaez (1951)
and Kendall, Stuart and Ord (1987, §4.3))

$$F_X(x_0) = \frac{1}{2} - \frac{1}{\pi} \int_0^\infty \frac{1}{\omega} \Im\{e^{-i\omega x_0} \phi_X(\omega)\} \, d\omega, \quad x_0 \in \mathbb{R}.$$   (2)

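The above integral is interpreted as an improper Riemann integral. Assuming
that $\phi_\epsilon$ is known, we use the fact that $\phi_X(\omega) = \phi_Y(\omega)/\phi_\epsilon(\omega)$, and replace $\phi_Y(\omega)$
by its empirical counterpart $\hat\phi_Y(\omega) := \frac{1}{n} \sum_{j=1}^n e^{i\omega Y_j}$. This leads to the following
estimator for $F_X(x_0)$:

$$\hat F_\lambda(x_0) := \frac{1}{2} - \frac{1}{\pi} \int_0^\lambda \frac{1}{\omega} \Im\Big\{e^{-i\omega x_0} \frac{\hat\phi_Y(\omega)}{\phi_\epsilon(\omega)}\Big\} \, d\omega,$$   (3)

where $\lambda > 0$ is a predefined parameter (to be discussed later).

This estimator is well defined if we assume that $|\phi_\epsilon(\omega)| \neq 0$ for all $\omega \in \mathbb{R}$.
This is a standard assumption in deconvolution problems; thus, throughout the
paper we assume that the error characteristic function does not vanish.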
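To make the construction concrete, the following is a minimal sketch of estimator (3), not the authors' MATLAB implementation. It assumes a zero-mean Laplace error with known scale, approximates the integral with a midpoint rule, and uses illustrative names (`laplace_cf`, `deconv_cdf`, `n_grid`) and an illustrative value $\lambda = 3$ that do not come from the paper.

```python
import cmath
import math
import random


def laplace_cf(omega, theta):
    """Characteristic function of a zero-mean Laplace law with scale theta."""
    return 1.0 / (1.0 + (theta * omega) ** 2)


def deconv_cdf(y, x0, lam, err_cf, n_grid=200):
    """Midpoint-rule approximation of estimator (3) at the point x0."""
    n = len(y)
    h = lam / n_grid
    total = 0.0
    for k in range(n_grid):
        w = (k + 0.5) * h  # midpoints never touch omega = 0
        emp_cf = sum(cmath.exp(1j * w * yj) for yj in y) / n
        total += (cmath.exp(-1j * w * x0) * emp_cf / err_cf(w)).imag / w
    return 0.5 - total * h / math.pi


# Illustrative data (an assumption): X ~ N(0, 1) plus Laplace noise with sd 0.2.
rng = random.Random(0)
theta = 1.0 / (5.0 * math.sqrt(2.0))
# The difference of two independent exponentials with mean theta is Laplace(theta).
y = [rng.gauss(0.0, 1.0) + rng.expovariate(1 / theta) - rng.expovariate(1 / theta)
     for _ in range(300)]
cf = lambda w: laplace_cf(w, theta)
est_low, est_mid, est_high = (deconv_cdf(y, x0, 3.0, cf) for x0 in (-1.0, 0.0, 1.0))
print(est_low, est_mid, est_high)
```

The three printed values should be close to the standard normal probabilities at -1, 0 and 1. The fixed $\lambda$ is a placeholder for the data-driven choice discussed later in the paper.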

Remark 1. In practice, the error distribution may not be completely known 
and additional information may be needed (e.g. repeated observations on Yj for 
a given Xj). In that case, a parametric approach may be taken for which the 
error distribution takes an explicit form (see below) depending on an unknown 
parameter for which an appropriate estimate may be used (we take this path 
when studying the real data example). A nonparametric approach would be to
estimate $\phi_\epsilon$ and use it in the estimation procedure. We discuss this point
later in the paper.

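We now take a deeper look into the deconvolution estimator (3). Generally,
the estimator takes the form

$$\hat F_\lambda(x_0) = \frac{1}{2} - \frac{1}{n} \sum_{j=1}^n I_\lambda(Y_j, x_0), \qquad I_\lambda(y, x_0) := \frac{1}{\pi} \int_0^\lambda \frac{1}{\omega} \Im\Big\{\frac{e^{i\omega(y - x_0)}}{\phi_\epsilon(\omega)}\Big\} \, d\omega.$$   (4)

Note that $I_\lambda(y, x_0)$ depends on the measurement error distribution. For example,
in the case of Laplace error with zero expectation and scale parameter $\theta$ we
have

$$I_\lambda(y, x_0) = \frac{1}{\pi} \int_0^\lambda \frac{\sin \omega(y - x_0)}{\omega} \, d\omega + \frac{\theta^2 \sin[\lambda(y - x_0)]}{\pi (y - x_0)^2} - \frac{\theta^2 \lambda \cos[\lambda(y - x_0)]}{\pi (y - x_0)},$$

while if the measurement error follows the normal distribution with standard
deviation $\sigma_\epsilon$, then

$$I_\lambda(y, x_0) = \frac{1}{\pi} \int_0^\lambda \frac{\sin \omega(y - x_0)}{\omega} \exp\Big(\frac{\sigma_\epsilon^2 \omega^2}{2}\Big) \, d\omega.$$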
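As a sanity check on the Laplace expression, the sketch below compares direct quadrature of $\frac{1}{\pi}\int_0^\lambda \frac{\sin \omega a}{\omega}(1 + \theta^2\omega^2)\,d\omega$, with $a = y - x_0$, against the split into a sine-integral term plus the two closed-form correction terms. It relies on the Laplace characteristic function $1/(1 + \theta^2\omega^2)$. The helper names and the numerical values of $a$, $\lambda$ and $\theta$ are illustrative assumptions.

```python
import math


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def sinc_like(t):
    """sin(t)/t with its limit value 1 at t = 0."""
    return 1.0 if t == 0.0 else math.sin(t) / t


def kernel_numeric(a, lam, theta):
    """I_lambda for Laplace error by direct quadrature, with a = y - x0."""
    # sin(w * a) / w is written as a * sinc(a * w) to stay finite at w = 0.
    f = lambda w: a * sinc_like(a * w) * (1.0 + (theta * w) ** 2)
    return simpson(f, 0.0, lam) / math.pi


def kernel_closed(a, lam, theta):
    """Sine-integral part plus the two closed-form Laplace terms (needs a != 0)."""
    si_part = simpson(lambda w: a * sinc_like(a * w), 0.0, lam) / math.pi
    corr = theta ** 2 * (math.sin(lam * a) / (math.pi * a ** 2)
                         - lam * math.cos(lam * a) / (math.pi * a))
    return si_part + corr


num = kernel_numeric(0.7, 4.0, 0.3)
closed = kernel_closed(0.7, 4.0, 0.3)
print(num, closed)
```

The two numbers agree to quadrature accuracy, which confirms the signs and the powers of $(y - x_0)$ in the correction terms.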

We see that the form of the deconvolution estimator is determined by the
distribution of the measurement error. Lower bounds on rates of convergence show
that the type of the error distribution is intrinsic to deconvolution problems.
Indeed, it is well known that rates of convergence of the distribution/density
function estimators in measurement error models are affected by the smoothness
of the error density and the density to be estimated (see e.g. Dattner,
Goldenshluger and Juditsky (2011) and references therein). Smoothness is usually
described by the tail behavior of the characteristic function, as in the following
assumption for $\phi_\epsilon$ which characterizes supersmooth distributions.

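Assumption 1. There exist positive constants $\beta > 0$, $\gamma > 0$, $c_0 > 0$ and $c_1 > 0$
such that

$$c_0 \exp(-\gamma|\omega|^\beta) \le |\phi_\epsilon(\omega)| \le c_1 \exp(-\gamma|\omega|^\beta), \quad \forall \omega \in \mathbb{R}.$$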

The normal ($\beta = 2$) and Cauchy ($\beta = 1$) densities are examples for which
Assumption 1 holds. In particular, the tails of the characteristic function of
the normal and Cauchy decay exponentially. This is in contrast to the ordinary
smooth case where the tail of $\phi_\epsilon$ decays in polynomial order. The spaces of
ordinary smooth functions correspond to classic Sobolev classes, while supersmooth
functions are infinitely differentiable.
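The two supersmooth examples can be checked directly from their characteristic functions, and the Laplace law gives a contrasting ordinary smooth case. The Cauchy scale symbol $s$ below is introduced only for this illustration.

```latex
% Normal error with standard deviation \sigma_\epsilon:
\phi_\epsilon(\omega) = \exp\!\big(-\sigma_\epsilon^2\omega^2/2\big)
  \quad\Rightarrow\quad \beta = 2,\ \gamma = \sigma_\epsilon^2/2,\ c_0 = c_1 = 1.

% Cauchy error with scale s (hypothetical symbol):
\phi_\epsilon(\omega) = \exp(-s|\omega|)
  \quad\Rightarrow\quad \beta = 1,\ \gamma = s,\ c_0 = c_1 = 1.

% Laplace error with scale \theta (ordinary smooth, Assumption 1 fails):
\phi_\epsilon(\omega) = \frac{1}{1 + \theta^2\omega^2}
  \sim \theta^{-2}|\omega|^{-2} \quad\text{as } |\omega| \to \infty.
```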

We also impose the following assumption. 
Assumption 2. There exist positive real numbers $\omega_0$, $b_\epsilon$ and $r$ such that

$$|\phi_\epsilon(\omega)| \ge 1 - b_\epsilon|\omega|^r, \quad \forall |\omega| \le \omega_0.$$

Assumption 2 describes the local behavior of the characteristic function of
the error $\phi_\epsilon$ near the origin, and holds if $\phi_\epsilon$ is smooth at $\omega = 0$. Since for any
non-degenerate distribution there exist positive constants $b$ and $\delta$ such that
$|\phi(\omega)| \le 1 - b|\omega|^2$ for all $|\omega| \le \delta$ [see, e.g., Petrov (1995, Lemma 1.5)],
we have $r \in (0, 2]$.

We consider the Sobolev class of functions in order to express the smoothness
of the estimated distribution $F_X$.

Definition 1. Let $\alpha > -1/2$, $L > 0$. We say that $F_X$ belongs to the class
$S_\alpha(L)$ if it has a density $f_X$ with respect to the Lebesgue measure, and

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} |\phi_X(\omega)|^2 (1 + \omega^2)^\alpha \, d\omega \le L^2.$$

The set $S_\alpha(L)$ with $\alpha > -1/2$ contains absolutely continuous distributions
while if $\alpha > 1/2$ then $S_\alpha(L)$ contains distributions with bounded continuous
densities.

In our study of the rates of convergence of the deconvolution estimator we
bound the maximal (pointwise root mean squared error) risk of the estimator
over the nonparametric family $S_\alpha(L)$ defined above. Rates of convergence of
the estimator (3) for the case of ordinary smooth error and $F_X \in S_\alpha(L)$ were
studied in Dattner, Goldenshluger and Juditsky (2011). The following theorem
establishes rates of convergence for the supersmooth case.

Theorem 1. Let the observations be given by model (1). Let the estimator for
$F_X(x_0)$ be $\hat F_\lambda(x_0)$ as defined in (3) and associate with the parameter



_r\nn ^c £ + ^ In (^) - 21n(tf L) 1/j8 



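If $\alpha > -1/2$ and Assumptions 1 and 2 hold, then we have for all $x_0 \in \mathbb{R}$ and large
enough $n$

$$\sup_{F_X \in S_\alpha(L)} \big\{E|\hat F_{\lambda_*}(x_0) - F_X(x_0)|^2\big\}^{1/2} \le K_0 L \Big(\frac{\ln n}{2\gamma}\Big)^{-(\alpha + 1/2)/\beta},$$   (5)
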

where $K_0 := \sqrt{2/\pi}\,[1 + (2\alpha + 1)^{-1/2}]$ and $c_\epsilon$ is a constant that depends only
on the error distribution.


Unlike the case of ordinary smooth errors, the rate of convergence in the
supersmooth case is very slow, logarithmic in the sample size $n$. We note that
this rate of convergence is minimax optimal for $\alpha > 1/2$. In order to prove such
a result one needs to show that the maximal risk (5) matches, up to a constant,
the minimal attainable risk for this problem. Indeed, under additional standard
assumptions on $\phi_\epsilon$ it can be shown that if $\alpha > 1/2$ and the class $S_\alpha(L)$ is rich
enough, then without loss of generality we have for all $n$ large enough

$$\inf_{\tilde F_X} \sup_{F_X \in S_\alpha(L)} \big\{E|\tilde F_X(0) - F_X(0)|^2\big\}^{1/2} \ge C[\ln n]^{-(\alpha + 1/2)/\beta},$$

where $C$ is a positive constant independent of $n$ and inf is taken over all possible
estimators $\tilde F_X(0)$ of $F_X(0)$. This lower bound on the minimax risk is of the same
order as the upper bound given in Theorem 1. Thus, the estimator (3) with
the choice $\lambda = \lambda_*$ is optimal in order; that is, no other estimator
can do better (in the minimax sense). This result can be proved in the same
way Dattner, Goldenshluger and Juditsky (2011) derived the lower bound for
the case of ordinary smooth errors. Under additional assumptions on the tail
behavior of $F_X$, Fan (1991) derived minimax optimal rates of convergence for
estimation over Hölder classes.

The optimal choice of the parameter $\lambda = \lambda_*$ as given in the theorem is a
result of the standard bias-variance trade-off. The bias of the estimator depends
only on the distribution of $X$ and decreases as $\lambda$ increases. On the other hand,
the variance is affected by the tail behavior of the error characteristic function
$\phi_\epsilon$ and increases with $\lambda$. It is clear that the role of the design parameter
$\lambda$ is crucial. The problem is that in practice we do not know the value of
the class parameters $\alpha$, $L$ and therefore $\lambda_*$ as defined in the theorem cannot
be calculated. In the next section we show how to choose the "bandwidth"
parameter $\lambda$ based only on the information we have, namely, the given data and
the assumed error distribution.

2.2 Adaptive deconvolution estimator

We first develop an adaptive version of the estimator for the case of supersmooth
error distribution and provide its theoretical properties. Then we discuss the
ordinary smooth case where we mimic the optimal choice $\lambda = \lambda_*$ by an adaptive
algorithm based on Lepski's adaptation procedure (Lepski 1990). The
theoretical properties of the resulting estimator in the ordinary smooth case were
studied in Dattner, Goldenshluger and Juditsky (2011) who showed that the
adaptive estimator is consistent and achieves the optimal rate of convergence
within a logarithmic factor (it can be shown that the logarithmic factor cannot
be eliminated, see Lepski (1990)).

We now develop an adaptive version of the estimator for the case of
supersmooth error. In particular, the next theorem shows that there is no additional
price to pay for adaptation in this case.

Theorem 2. Let the observations be given by model (1). Let the estimator for
$F_X(x_0)$ be $\hat F_\lambda(x_0)$ as defined in (3) and associate with the parameter

Inn 



lnc e + [ln(^ 



) 



} 



1//3 



2- 



2n 



G 



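If $\alpha > -1/2$ and Assumptions 1 and 2 hold, then we have for all $x_0 \in \mathbb{R}$ and large
enough $n$

$$\sup_{F_X \in S_\alpha(L)} \big\{E|\hat F_{\hat\lambda}(x_0) - F_X(x_0)|^2\big\}^{1/2} \le K_0 L \Big(\frac{\ln n}{2\gamma}\Big)^{-(\alpha + 1/2)/\beta},$$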

where $K_0 := \sqrt{2/\pi}\,[1 + (2\alpha + 1)^{-1/2}]$ and $c_\epsilon$ is a constant that depends only
on the error distribution.


Note that the rate of convergence in the theorem is the optimal one when
$\alpha > 1/2$. Moreover, $\hat\lambda$ does not depend on the class parameters $\alpha$ and $L$. In
particular, $\hat\lambda$ is smaller than $\lambda_*$ (as defined in Theorem 1), which depends on
$\alpha$ in a term of second order. Therefore, the small modification of $\lambda_*$, which
makes the bias dominant in the bias-variance trade-off, does not affect the rate
of convergence.

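We now turn to the case of ordinary smooth error distribution. Consider the
set of positive parameters $\Lambda := \{\lambda_{\min}, \ldots, \lambda_{\max}\}$, and the family of estimators
$\mathcal{F}_\Lambda := \{\hat F_\lambda(x_0), \lambda \in \Lambda\}$, where $\hat F_\lambda(x_0)$ is given by (3). Define

$$\hat\sigma_\lambda := \Big[\frac{1}{n} \sum_{j=1}^n \{I_\lambda(Y_j, x_0)\}^2\Big]^{1/2},$$   (6)

where $I_\lambda$ is given by (4). The adaptive estimator $\hat F_\Lambda(x_0)$ is obtained by selecting
from the family $\mathcal{F}_\Lambda$ according to the following rule. Let $K_\epsilon = 0.0275 + 0.3074\sigma_\epsilon$,
and with any estimator $\hat F_\lambda(x_0)$ we associate the interval

$$Q_\lambda := \Big[\hat F_\lambda(x_0) - K_\epsilon \Big(\frac{\ln n}{n}\Big)^{1/2} \hat\sigma_\lambda, \ \hat F_\lambda(x_0) + K_\epsilon \Big(\frac{\ln n}{n}\Big)^{1/2} \hat\sigma_\lambda\Big],$$

and define

$$\hat F_\Lambda(x_0) := \hat F_{\hat\lambda}(x_0),$$   (7)

where

$$\hat\lambda := \min\Big\{\lambda \in \Lambda : \bigcap_{\mu \ge \lambda,\ \mu \in \Lambda} Q_\mu \neq \emptyset\Big\}.$$

We use below the set $\Lambda = 0.01(0.05)10$, that is, a grid starting at 0.01 with
step 0.05 and upper limit 10, and the projection of $\hat F_\Lambda(x_0)$ on the
interval [0, 1] as the final estimator.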
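The selection rule can be sketched as follows. This is a hedged illustration of a Lepski-type rule, not the authors' code: it assumes the interval half-width is $K_\epsilon (\ln n / n)^{1/2} \hat\sigma_\lambda$ and that the rule picks the smallest $\lambda$ whose interval intersects the intervals of all larger $\lambda$ in the grid. The function name `lepski_select` and the toy inputs are hypothetical.

```python
import math


def lepski_select(lams, estimates, sigmas, n, K):
    """Pick the smallest lambda whose interval meets those of all larger lambdas.

    lams must be sorted increasingly; estimates[i] and sigmas[i] belong to lams[i].
    Returns the selected lambda and the estimate projected on [0, 1].
    """
    half = [K * math.sqrt(math.log(n) / n) * s for s in sigmas]
    lower = [e - h for e, h in zip(estimates, half)]
    upper = [e + h for e, h in zip(estimates, half)]
    chosen = len(lams) - 1  # the largest lambda always satisfies the rule
    run_lo, run_hi = -math.inf, math.inf
    for i in range(len(lams) - 1, -1, -1):
        run_lo = max(run_lo, lower[i])
        run_hi = min(run_hi, upper[i])
        if run_lo > run_hi:
            break  # the running intersection is empty from here downwards
        chosen = i
    return lams[chosen], min(1.0, max(0.0, estimates[chosen]))


# Toy inputs (assumptions): the estimate at the smallest lambda is heavily biased.
lam_hat, f_hat = lepski_select([1, 2, 3, 4], [0.10, 0.30, 0.31, 0.32],
                               [0.1, 0.1, 0.1, 0.1], n=100, K=1.0)
print(lam_hat, f_hat)

# A grid of the form 0.01(0.05)10: start 0.01, step 0.05, never exceeding 10.
grid = [0.01 + 0.05 * k for k in range(200)]
```

In the toy call the intervals for $\lambda = 2, 3, 4$ overlap while the interval for $\lambda = 1$ is disjoint from them, so the rule stops at $\lambda = 2$. In practice `estimates` and `sigmas` would come from (3) and (6) evaluated on the grid.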

The value of $K_\epsilon$ as specified above is a result of the tuning of the adaptive
algorithm. Although according to the theory, for a given error distribution one
can determine the constant $K_\epsilon$, it turns out to be too conservative in practice.
This problem was already noted by Spokoiny and Vial (2009) who proposed a
tuning approach for a different model.

A detailed explanation of our tuning approach is given in the appendix. We
note that we "tuned" our algorithm according to the Laplace error. In the sequel
we use this rule for all error distributions including the normal one (and not the
adaptive estimator defined in Theorem 2 for the supersmooth case). Ideally, we
could calibrate our estimator specifically for a given error distribution. However,
considering the long computational time of calibration and the fact that the
performance of the adaptive estimator in simulations does not seem to be very
sensitive to this assumption, we use this rule for all measurement error models
in our simulation study.

3 Simulation study 
3.1 Study description 

The following setup is used in our simulation study. The unobserved
distribution $F_X$ is assumed to be one of the following.

1. Gamma with shape parameter 3 and scale $1/\sqrt{3}$.

2. Standard normal.

Denote the standard deviations of $X$ and $\epsilon$ by $\sigma_X$ and $\sigma_\epsilon$ respectively. The
error distributions are chosen such that we have a specific noise to signal ratio
$\sigma_\epsilon/\sigma_X$. In particular, we are interested in the values $\sigma_\epsilon/\sigma_X = 0.2, 0.5$,
corresponding to 20%, 50% error contamination respectively. We consider eight
error distributions as follows.

1. Gamma distribution with shape parameter two, and scale parameters
$\theta = 1/(5\sqrt{2}), 1/(2\sqrt{2})$.

2. As in (1) but relocated to have zero expectation.

3. Laplace distribution with zero expectation and the same scale parameters
as in (1).

4. Normal distribution with zero expectation and standard deviations
$\sigma_\epsilon = 1/5, 1/2$.

Two of the above (3. and 4.) provide error distributions which are symmetric
around zero but differ in their tail properties. The other two are skewed
distributions with (1.) resulting in only positive values while (2.) allows for negative
values as well.
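The scale parameters above can be checked against the stated contamination levels with a few lines of arithmetic. The sketch below uses the standard variance formulas (shape times scale squared for the Gamma law, twice the squared scale for the Laplace law); the helper names are illustrative.

```python
import math


def gamma_sd(shape, scale):
    """Standard deviation of a Gamma law: sqrt(shape) * scale."""
    return math.sqrt(shape) * scale


def laplace_sd(scale):
    """Standard deviation of a Laplace law: sqrt(2) * scale."""
    return math.sqrt(2.0) * scale


# Both choices of F_X have unit standard deviation.
sigma_x_gamma = gamma_sd(3, 1.0 / math.sqrt(3.0))
sigma_x_normal = 1.0

ratios = []
for theta in (1.0 / (5.0 * math.sqrt(2.0)), 1.0 / (2.0 * math.sqrt(2.0))):
    ratios.append((gamma_sd(2, theta) / sigma_x_gamma,
                   laplace_sd(theta) / sigma_x_normal))
print(ratios)
```

Both scale values give the same noise to signal ratio for the Gamma and Laplace errors (0.2 and 0.5), matching the normal errors with $\sigma_\epsilon = 1/5, 1/2$.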

Usually, measurement errors are considered to have zero expectation but in 
some cases this appears not to hold. In the context of blood pressure Marshall 
(2004) discusses that the presence of a medical student results in an increase 
in measured blood pressure. Walker and Rollins (1997) in a robustness study 
of ANOVA consider a beta distribution with nonzero expectation as a possi- 
ble model for measurement errors. Albers, Kallenberg and Otten (1998) in 
the context of screening production processes discuss situations with nonzero 
expectation for measurement error. 

Altogether, we have sixteen combinations of measurement error models.
Each combination is simulated for sample sizes $n = 100$ and $500$, resulting in
thirty-two different experimental setups. For each experimental setup, 1000
independent samples of size $n$ were generated, from which we estimated
$F_X(x_0)$ for various values of $x_0$, where the $x_0$ values were chosen to correspond
to the percentiles 0.1, 0.25, 0.5, 0.75, 0.9 of the unobserved distribution $F_X$.
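The evaluation points $x_0$ can be obtained by inverting the two unobserved distributions. The sketch below is one possible way to do this with the Python standard library: it uses `statistics.NormalDist` for the normal case and, for the Gamma case with integer shape 3, the closed-form (Erlang) distribution function inverted by bisection. The function names are illustrative.

```python
import math
from statistics import NormalDist

PROBS = (0.1, 0.25, 0.5, 0.75, 0.9)


def gamma3_cdf(x, scale):
    """CDF of a Gamma law with shape 3 (closed form for integer shape)."""
    if x <= 0.0:
        return 0.0
    t = x / scale
    return 1.0 - math.exp(-t) * (1.0 + t + t * t / 2.0)


def gamma3_quantile(p, scale, tol=1e-12):
    """Invert gamma3_cdf by bisection on a bracketing interval."""
    lo, hi = 0.0, 1.0
    while gamma3_cdf(hi, scale) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma3_cdf(mid, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


scale = 1.0 / math.sqrt(3.0)
normal_x0 = [NormalDist().inv_cdf(p) for p in PROBS]
gamma_x0 = [gamma3_quantile(p, scale) for p in PROBS]
print(normal_x0)
print(gamma_x0)
```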

In all the scenarios just defined, the behavior of the adaptive estimator
was compared to two other estimators. The first is the empirical distribution
function of the observations which we call the naive estimator,

$$\hat F_Y(x_0) := \frac{1}{n} \sum_{j=1}^n 1(Y_j \le x_0),$$

where $1(\cdot)$ stands for the indicator function. The second is the SIMEX
(simulation extrapolation) estimator $\hat F_S(x_0)$ introduced in Stefanski and Bay (1996),
which we describe now.

In simulation extrapolation, estimators are recomputed on a large number
$B$ of measurement error-inflated, pseudo data sets, $\{Y_{j,b}(\tau)\}_{j=1}^n$, $(b = 1, \ldots, B)$,
with

$$Y_{j,b}(\tau) = Y_j + \sqrt{\tau}\, \epsilon^*_{j,b}, \quad (j = 1, \ldots, n, \ b = 1, \ldots, B),$$

where $\epsilon^*_{j,b} \sim f_\epsilon$ are independent, pseudo-random variables and $\tau > 0$ is a
constant controlling the amount of added error. According to this setup the total
measurement error variance in $Y_{j,b}(\tau)$ is $\sigma_\epsilon^2(\tau + 1)$. Thus, the general idea is
based on the fact that if we let $\tau = -1$ then we end up with zero measurement
error in the random variables $Y_{j,b}(\tau)$.

The cumulative distribution function estimator calculated from the $b$th
variance-inflated data set $Y_{j,b}(\tau)$ is called the $b$th pseudo estimator, and is

$$\frac{1}{n} \sum_{j=1}^n 1(Y_{j,b}(\tau) \le x_0), \quad (b = 1, \ldots, B).$$

We now average the pseudo estimators and define

$$\hat F_{Y,\tau,n}(x_0) := \frac{1}{B} \sum_{b=1}^B \frac{1}{n} \sum_{j=1}^n 1(Y_{j,b}(\tau) \le x_0).$$


The SIMEX method is based on the assumption that the expectation $E[\hat F_{Y,\tau,n}(x_0)]$
can be well approximated by a quadratic function of $\tau$: $\beta_0 + \beta_1\tau + \beta_2\tau^2$, for
constants $\beta_0, \beta_1, \beta_2$ depending on $x_0$, $\sigma_\epsilon$ and $F_X$. For a given sequence $\tau_1, \ldots, \tau_m$,
the SIMEX procedure requires estimating $\{\hat F_{Y,\tau_1,n}(x_0), \ldots, \hat F_{Y,\tau_m,n}(x_0)\}$, so that
$\beta_0, \beta_1, \beta_2$ can be estimated by a least squares regression of
$\{\hat F_{Y,\tau_1,n}(x_0), \ldots, \hat F_{Y,\tau_m,n}(x_0)\}$ on $\tau_1, \ldots, \tau_m$, yielding the estimates
$\hat\beta_0, \hat\beta_1, \hat\beta_2$. Extrapolation to the case of no
measurement errors is accomplished by letting $\tau \to -1$, resulting in the SIMEX
estimator

$$\hat F_S(x_0) := \hat\beta_0 - \hat\beta_1 + \hat\beta_2.$$

In our simulations $B = 2000$ and following Stefanski and Bay (1996) we set
$\tau = 0.05(0.4875)2$.
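The SIMEX steps just described (inflate, average, fit a quadratic in $\tau$, extrapolate to $\tau = -1$) can be sketched in a few functions. This is an illustrative sketch under simplifying assumptions, not the implementation used in the paper: it uses normal errors, a small $B$ for speed, and hypothetical names (`ecdf`, `quadratic_extrapolate`, `simex_cdf`).

```python
import math
import random


def ecdf(values, x0):
    """Empirical distribution function of `values` at x0."""
    return sum(1 for v in values if v <= x0) / len(values)


def quadratic_extrapolate(taus, vals, target=-1.0):
    """Least-squares fit of b0 + b1*t + b2*t^2, evaluated at `target`."""
    # Normal equations for the design columns 1, t, t^2.
    s = [sum(t ** k for t in taus) for k in range(5)]
    r = [sum(v * t ** k for t, v in zip(taus, vals)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for c in range(3):
        p = max(range(c, 3), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(3):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * w for u, w in zip(a[i], a[c])]
    b = [a[i][3] / a[i][i] for i in range(3)]
    return b[0] + b[1] * target + b[2] * target ** 2


def simex_cdf(y, x0, draw_error, taus, B, rng):
    """SIMEX estimate of F_X(x0); draw_error(rng) returns one draw from f_eps."""
    means = []
    for tau in taus:
        acc = 0.0
        for _ in range(B):
            acc += ecdf([yj + math.sqrt(tau) * draw_error(rng) for yj in y], x0)
        means.append(acc / B)
    return quadratic_extrapolate(taus, means)


# Illustrative run (assumed setup): X ~ N(0, 1), normal error with sd 0.5.
rng = random.Random(1)
sigma_eps = 0.5
y = [rng.gauss(0.0, 1.0) + rng.gauss(0.0, sigma_eps) for _ in range(200)]
taus = [0.05 + 0.4875 * k for k in range(5)]  # the grid 0.05(0.4875)2
est = simex_cdf(y, -0.6745, lambda r: r.gauss(0.0, sigma_eps), taus, B=20, rng=rng)
print(ecdf(y, -0.6745), est)
```

With `target=-1.0` the fitted quadratic evaluates to $\hat\beta_0 - \hat\beta_1 + \hat\beta_2$, which is the SIMEX estimator displayed above.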
3.2 Numerical results

Tables 1-4 summarize the empirical root mean square error and bias of the
three estimators described above for the different experimental setups. We
present only the results for sample size $n = 500$, since they are similar to
those for $n = 100$, but are more stable. For each error distribution in the
tables, the first block is for 20% contamination while the second block is for 50%
contamination. The observed absolute value of the bias × 10 of the estimator is
given in parentheses.
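For reference, the two summary measures reported in the tables can be computed from Monte Carlo replicates as in the sketch below. The function name and the three toy replicate values are illustrative only.

```python
import math


def rmse_and_bias(estimates, truth):
    """Empirical root mean square error and bias over Monte Carlo replicates."""
    m = len(estimates)
    bias = sum(estimates) / m - truth
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / m)
    return rmse, bias


# Toy replicates of an estimator of F_X(x_0) = 0.25.
rmse, bias = rmse_and_bias([0.24, 0.26, 0.27], 0.25)
# The tables report the RMSE together with |bias| x 10 in parentheses.
print(f"{rmse:.3f} ({abs(bias) * 10:.3f})")
```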

In Tables 1 and 2 we see that when the error takes only positive values,
i.e., is Gamma distributed, then the adaptive estimator achieves better results 
uniformly over the distribution of X for both 20% and 50% contamination. The 
bias of the SIMEX and naive estimators is very large in these cases. When 
the distribution of the error is Gamma around zero, then the performance of 
the SIMEX and naive estimators substantially improves. However, the adaptive 
estimator is usually better in root mean square error, and when not, its root 
mean square error value is close to the best. 

For Laplace distributed measurement error the results are similar for both X 
distributions. When the contamination is 20% the adaptive estimator is again 
uniformly better than the other two. However, the results are more mixed when 
we have 50% contamination. 

When the error is normally distributed, the results are mixed. Here, the root 
mean square error of the adaptive estimator is high when estimating lower and 
upper quantiles under 20% contamination, but has the same order as SIMEX 
for estimating other quantiles. Note that in terms of root mean square error, the 
naive estimator performs very well under normal error with small contamination. 

Remark 2. Recalling that for normal error the optimal minimax rates are 
very slow (logarithmic in the sample size), one may wonder how in practice the 
estimation results seem to be reasonable as implied by our simulation study.
This may be a result of the essentially small error variance, see for example 
Fan (1992) who studied how large a noise level is acceptable under supersmooth 
error distributions. 

Summarizing the numerical results, we see that the adaptive estimator per- 
forms reasonably well regardless of the shape and location of the error distri- 
bution while the SIMEX and naive estimators do not. Indeed, when the error 
is Gamma distributed, there are cases where the empirical root mean square 
error of the adaptive estimator is about one tenth of the empirical root mean 
square error of the naive estimator. This phenomenon is illustrated in Figure 1.
We present there box plots for the case where $X \sim N(0, 1)$ and $\epsilon$ is Gamma
distributed with shape parameter two and scale parameter $1/(5\sqrt{2})$ over the
1000 Monte Carlo simulations based on a sample size of $n = 500$. In the figure
we focus on the estimation of the cumulative probabilities 0.25 and 0.75. The 
box plots for the adaptive, SIMEX and naive estimator are displayed side by 
side. It is clear from the plots that the naive estimator is totally wrong for the 
asymmetric error distribution. The SIMEX is less affected and the adaptive 
Table 1: Empirical root mean square error and bias × 10 (in parentheses) for
estimating standard normal under nonsymmetric error distribution.

                                        F_X(x_0)
Estimator   0.1             0.25            0.5             0.75            0.9

Gamma error - 20% contamination
Adaptive    0.013 (0.031)   0.020 (0.032)   0.022 (0.004)   0.019 (0.028)   0.013 (0.014)
SIMEX       0.020 (0.119)   0.029 (0.158)   0.031 (0.056)   0.032 (0.149)   0.032 (0.229)
Naive       0.039 (0.373)   0.078 (0.756)   0.111 (1.093)   0.102 (1.002)   0.065 (0.630)

Gamma error - 50% contamination
Adaptive    0.019 (0.042)   0.026 (0.051)   0.027 (0.019)   0.028 (0.037)   0.024 (0.044)
SIMEX       0.048 (0.458)   0.087 (0.833)   0.097 (0.908)   0.040 (0.146)   0.073 (0.676)
Naive       0.066 (0.656)   0.145 (1.442)   0.237 (2.361)   0.254 (2.528)   0.198 (1.966)

Gamma error with zero expectation - 20% contamination
Adaptive    0.013 (0.020)   0.019 (0.031)   0.021 (0.003)   0.019 (0.028)   0.014 (0.024)
SIMEX       0.016 (0.005)   0.023 (0.008)   0.027 (0.001)   0.022 (0.004)   0.016 (0.001)
Naive       0.014 (0.039)   0.020 (0.051)   0.023 (0.003)   0.020 (0.042)   0.014 (0.046)

Gamma error with zero expectation - 50% contamination
Adaptive    0.018 (0.035)   0.026 (0.050)   0.028 (0.001)   0.030 (0.056)   0.024 (0.044)
SIMEX       0.020 (0.007)   0.027 (0.035)   0.031 (0.045)   0.027 (0.023)   0.021 (0.001)
Naive       0.027 (0.232)   0.033 (0.263)   0.024 (0.077)   0.026 (0.175)   0.030 (0.256)

Table 2: Empirical root mean square error and bias × 10 (in parentheses) for
estimating Gamma with shape three and scale $1/\sqrt{3}$ under nonsymmetric error
distribution.

                                        F_X(x_0)
Estimator   0.1             0.25            0.5             0.75            0.9

Gamma error - 20% contamination
Adaptive    0.014 (0.041)   0.018 (0.003)   0.023 (0.021)   0.019 (0.001)   0.014 (0.013)
SIMEX       0.045 (0.420)   0.041 (0.321)   0.034 (0.084)   0.037 (0.234)   0.026 (0.154)
Naive       0.065 (0.642)   0.112 (1.113)   0.128 (1.264)   0.092 (0.893)   0.046 (0.434)

Gamma error - 50% contamination
Adaptive    0.021 (0.056)   0.027 (0.032)   0.032 (0.057)   0.029 (0.036)   0.021 (0.014)
SIMEX       0.087 (0.871)   0.161 (1.600)   0.137 (1.332)   0.048 (0.296)   0.093 (0.917)
Naive       0.088 (0.883)   0.190 (1.896)   0.281 (2.801)   0.252 (2.512)   0.150 (1.482)

Gamma error with zero expectation - 20% contamination
Adaptive    0.014 (0.041)   0.018 (0.003)   0.023 (0.021)   0.019 (0.001)   0.014 (0.013)
SIMEX       0.019 (0.007)   0.025 (0.006)   0.027 (0.008)   0.022 (0.003)   0.016 (0.006)
Naive       0.018 (0.108)   0.020 (0.047)   0.023 (0.027)   0.020 (0.049)   0.014 (0.032)

Gamma error with zero expectation - 50% contamination
Adaptive    0.021 (0.051)   0.026 (0.030)   0.033 (0.059)   0.030 (0.030)   0.021 (0.013)
SIMEX       0.025 (0.094)   0.030 (0.073)   0.031 (0.021)   0.026 (0.002)   0.019 (0.001)
Naive       0.053 (0.509)   0.037 (0.309)   0.024 (0.054)   0.031 (0.247)   0.024 (0.192)

Table 3: Empirical root mean square error and bias × 10 (in parentheses) for
estimating standard normal under symmetric error distribution.

                                        F_X(x_0)
Estimator   0.1             0.25            0.5             0.75            0.9

Laplace error - 20% contamination
Adaptive    0.013 (0.027)   0.019 (0.012)   0.021 (0.002)   0.019 (0.027)   0.013 (0.010)
SIMEX       0.016 (0.007)   0.023 (0.013)   0.026 (0.001)   0.022 (0.003)   0.016 (0.014)
Naive       0.014 (0.051)   0.020 (0.029)   0.023 (0.001)   0.019 (0.043)   0.014 (0.032)

Laplace error - 50% contamination
Adaptive    0.022 (0.055)   0.027 (0.047)   0.029 (0.003)   0.029 (0.044)   0.022 (0.044)
SIMEX       0.019 (0.005)   0.025 (0.003)   0.029 (0.002)   0.026 (0.001)   0.020 (0.009)
Naive       0.029 (0.253)   0.028 (0.210)   0.023 (0.004)   0.029 (0.211)   0.029 (0.243)

Normal error - 20% contamination
Adaptive    0.032 (0.286)   0.022 (0.128)   0.019 (0.005)   0.023 (0.138)   0.032 (0.290)
SIMEX       0.016 (0.005)   0.023 (0.002)   0.025 (0.005)   0.024 (0.005)   0.016 (0.000)
Naive       0.015 (0.051)   0.020 (0.045)   0.021 (0.004)   0.021 (0.040)   0.014 (0.042)

Normal error - 50% contamination
Adaptive    0.025 (0.186)   0.029 (0.198)   0.019 (0.003)   0.030 (0.210)   0.024 (0.180)
SIMEX       0.020 (0.012)   0.027 (0.008)   0.031 (0.001)   0.027 (0.028)   0.020 (0.014)
Naive       0.030 (0.260)   0.030 (0.225)   0.023 (0.001)   0.031 (0.237)   0.030 (0.262)

Table 4: Empirical root mean square error and bias × 10 (in parentheses) for
estimating Gamma with shape three and scale $1/\sqrt{3}$ under symmetric error
distribution.

                                        F_X(x_0)
Estimator   0.1             0.25            0.5             0.75            0.9

Laplace error - 20% contamination
Adaptive    0.014 (0.028)   0.019 (0.004)   0.021 (0.020)   0.019 (0.018)   0.014 (0.017)
SIMEX       0.018 (0.014)   0.024 (0.003)   0.025 (0.006)   0.021 (0.016)   0.015 (0.001)
Naive       0.017 (0.085)   0.020 (0.028)   0.022 (0.032)   0.019 (0.032)   0.014 (0.024)

Laplace error - 50% contamination
Adaptive    0.026 (0.056)   0.029 (0.022)   0.033 (0.055)   0.027 (0.011)   0.019 (0.010)
SIMEX       0.022 (0.022)   0.027 (0.024)   0.030 (0.024)   0.026 (0.029)   0.018 (0.009)
Naive       0.045 (0.423)   0.027 (0.177)   0.027 (0.141)   0.031 (0.232)   0.022 (0.168)

Normal error - 20% contamination
Adaptive    0.029 (0.257)   0.023 (0.137)   0.021 (0.003)   0.022 (0.110)   0.030 (0.273)
SIMEX       0.019 (0.003)   0.023 (0.000)   0.027 (0.003)   0.023 (0.010)   0.015 (0.005)
Naive       0.017 (0.099)   0.019 (0.029)   0.023 (0.039)   0.020 (0.038)   0.014 (0.023)

Normal error - 50% contamination
Adaptive    0.040 (0.357)   0.027 (0.168)   0.030 (0.197)   0.029 (0.198)   0.016 (0.060)
SIMEX       0.026 (0.109)   0.028 (0.002)   0.031 (0.073)   0.027 (0.014)   0.019 (0.001)
Naive       0.052 (0.493)   0.029 (0.206)   0.028 (0.180)   0.034 (0.273)   0.022 (0.170)

[Box plots in two panels, each showing the Adaptive, SIMEX and Naive estimators]

Figure 1: The effect of the shape of the error distribution on the performance of
the estimators. Here $X \sim N(0, 1)$, $\epsilon$ is Gamma distributed with shape parameter
two and scale parameter $1/(5\sqrt{2})$, MC = 1000 and $n = 500$.



estimator achieves the best result. When the measurement error distribution is 
symmetric, the results are mixed with no method being superior all the time. 
However, we note that for larger sample sizes, we expect the naive estimator to 
be worse than the adaptive estimator since the naive estimator is not consistent. 

MATLAB code for executing all simulations described above and implementing
the adaptive estimator for user data is available at http://stat.haifa.ac.il/~idattner/add



4 Estimating hypertension prevalence 
4.1 Data description 

High blood pressure (hypertension) is a direct cause of serious cardiovascular 
disease (Kannel (1995)) and estimating hypertension prevalence is of substan- 
tial interest. Specifically, a blood pressure level of 140/90 mmHg or greater is 
considered high. However, blood pressure is known to be measured with addi- 
tional error which needs to be addressed in its analysis (see e.g., Marshall (2004) 
and references therein). Thus, treating the observed blood pressure measure- 
ments naively and estimating hypertension prevalence with, say, the empirical 
distribution function, would result in a biased estimate. 
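
The size of this bias can be illustrated in a simple special case. The sketch below assumes, purely for illustration, that both the true values and the errors are normal, so that the observed data are normal with inflated variance; the function name `naive_limit` is hypothetical and does not come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def naive_limit(p, sd_x, sd_eps):
    """Value that the empirical CDF of Y = X + eps targets at the point x0
    where F_X(x0) = p, assuming X and eps are independent normals
    (an illustrative assumption, not a general result)."""
    std = NormalDist()
    z = std.inv_cdf(p)                      # standardized quantile of X
    shrink = sd_x / sqrt(sd_x ** 2 + sd_eps ** 2)
    return std.cdf(z * shrink)              # F_Y evaluated at the same x0


# Standard normal X with error standard deviation 0.5 (50% contamination).
target = 0.1
observed = naive_limit(target, 1.0, 0.5)
print(f"F_X(x0) = {target}, naive estimator converges to {observed:.4f}")
```

Under these assumptions the naive estimator is unbiased only at the median; away from it, the estimate is pulled towards 0.5, and the bias does not vanish as the sample size grows.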

We illustrate our method using data from the Framingham Heart Study
(Carroll, Ruppert, and Stefanski (2006)). This study consists of a series of exams
taken two years apart. We use systolic blood pressure (SBP) measurements of
1,615 men aged 31-65, from Exam two and Exam three. We treat the SBP
values of each individual $j$ for the two exams, $(Y_{j,1}, Y_{j,2})$, as repeated measures
of the long-term average SBP, which is denoted by $X_j$:

$$Y_{j,1} = X_j + \epsilon_{j,1}, \qquad Y_{j,2} = X_j + \epsilon_{j,2}, \qquad (8)$$

for individuals $j = 1, \ldots, n$.

Figure 2: Systolic blood pressure measurements of 1,615 men aged 31-65 from
the Framingham Heart Study.

Following Carroll, Ruppert, and Stefanski (2006), we use the average of the
two exams $Y_j = (Y_{j,1} + Y_{j,2})/2$, so that the model in our case is

$$Y_j = X_j + \epsilon_j, \qquad (9)$$

where $\epsilon_j = (\epsilon_{j,1} + \epsilon_{j,2})/2$, and we are interested in the estimation of $1 - F_X(140)$
from the data $Y_j$, $j = 1, \ldots, 1615$. A histogram of the data $Y_j$ is displayed in
Figure 2.

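Note that the repeated measures model (8) represents a balanced random
effects model, thus the measurement error variance estimate (Searle (1992)) is

$$\hat\sigma_\epsilon^2 = \frac{1}{n(p-1)} \sum_{j=1}^{n} \sum_{k=1}^{p} (Y_{j,k} - \bar Y_{j\cdot})^2, \qquad (10)$$

where $\bar Y_{j\cdot} := \frac{1}{p}\sum_{k=1}^{p} Y_{j,k}$ is the sample mean for each individual $j$. In our case
$n = 1{,}615$, $p = 2$ and the measurement error variance estimate is $\hat\sigma_\epsilon^2 = 84.755$.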
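
As an illustration of a variance estimate of this kind, the sketch below computes the standard pooled within-subject mean square for a balanced design. It assumes the usual $n(p-1)$ degrees of freedom; the function name and the toy readings are hypothetical and are not taken from the Framingham data.

```python
from statistics import fmean


def within_subject_variance(measurements):
    """Pooled within-subject mean square for balanced repeated measures.

    measurements: list of n rows, each holding the p repeated values
    of one subject. Returns sum_j sum_k (Y_jk - mean_j)^2 / (n * (p - 1)).
    """
    n = len(measurements)
    p = len(measurements[0])
    if p < 2 or any(len(row) != p for row in measurements):
        raise ValueError("need a balanced design with p >= 2 measures per subject")
    total = 0.0
    for row in measurements:
        subject_mean = fmean(row)
        total += sum((y - subject_mean) ** 2 for y in row)
    return total / (n * (p - 1))


# Hypothetical readings: three subjects, two exams each.
toy = [[120.0, 130.0], [140.0, 150.0], [110.0, 114.0]]
sigma2_hat = within_subject_variance(toy)
print(sigma2_hat)
```

With $p = 2$ this reduces to the sum of squared between-exam differences divided by $2n$, which is a convenient check on any implementation.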
An important aspect of the model described here that we did not consider
in our simulation study of Section 3 is that $\sigma_\epsilon$ is not known but estimated from
the data. In order to understand how this practical feature affects our method,
we performed another simulation study, based on the model as defined in (8)-(9),
in which we assume that $\epsilon \sim N(0, 9.206^2)$ and $X \sim N(130.757, 17.528^2)$.
In particular, the simulation step of the SIMEX estimator is based on $\hat\sigma_\epsilon^2$ as
given by (10) and our method is based on a standardized version of (9), i.e.,
$(Y_j - \frac{1}{n}\sum_{j=1}^{n} Y_j)/\hat\sigma_Y$ and the standardized error estimate $\hat\sigma_\epsilon/\hat\sigma_Y$ (the standardization
is needed because of the way we tuned the adaptive algorithm; see the appendix
for a detailed explanation).

We note that the $X$ parameters are not arbitrary. Under the assumption
that the errors have zero mean, $\hat\mu_X = 130.757$ is just the observed sample mean,
and $\hat\sigma_X = 17.528$ is

$$\hat\sigma_X = \{\hat\sigma_Y^2 - \hat\sigma_\epsilon^2\}^{1/2},$$

where $\hat\sigma_Y^2$ is the sample variance of the $Y_j$ about $\bar Y = \frac{1}{n}\sum_{j=1}^{n} Y_j$. Table 5 presents the results of 1000 simulations which
were carried out with a sample size of $n = 500$ and contamination of about 50%
(9.206/17.528). These can be compared to the results for estimating $N(0, 1)$
under $N(0, 0.5^2)$ error in the simulation study of Section 3.

Table 5: Empirical RMSE and bias × 10 (in parentheses) for estimating
$N(130.757, 17.528^2)$ under $N(0, 9.206^2)$ error.

                                      $F_X(x_0)$
Estimator   0.1             0.25            0.5             0.75            0.9
Adaptive    0.017 (0.088)   0.022 (0.117)   0.017 (0.007)   0.022 (0.116)   0.017 (0.080)
SIMEX       0.019 (0.000)   0.026 (0.005)   0.029 (0.003)   0.025 (0.003)   0.019 (0.005)
Naive       0.021 (0.148)   0.024 (0.131)   0.022 (0.002)   0.024 (0.132)   0.021 (0.153)



We see that for the specific parametric setup here, the adaptive estimator
is uniformly better than the SIMEX and naive estimators in terms of root mean
square error. The large $\sigma_X$ in this case indicates the smoothness of the $X$
distribution. If we consider theoretical aspects of these methods, then the good
theoretical properties of the adaptive estimator described above guarantee that,
in the minimax sense, no other estimator can do better over the class of finite
smoothness distributions.

4.2 Statistical inference 

When estimating a disease prevalence, an applied statistician may not be
satisfied with only pointwise properties of a new method, no matter how good
they are. Thus, the next natural step would be to discuss the accuracy of the
adaptive estimator and provide interval estimation. However, it is a known fact
that confidence bands cannot adapt to the smoothness of the unknown function
$F_X$ (see Low (1997)). One possibility would be to use bootstrap confidence
intervals, but in our case they require heavy computational effort with no
underlying theory to justify them. For practical implementation we suggest using
the following approach.

Let $\tau = \sqrt{F_Y(x_0)(1 - F_Y(x_0))/n}$ and consider the following asymptotically based
$1 - \alpha$ confidence interval for $F_Y(x_0)$,

$$1 - \alpha = P\{\hat F_Y(x_0) - z_{1-\alpha/2}\tau \le F_Y(x_0) \le \hat F_Y(x_0) + z_{1-\alpha/2}\tau\}, \qquad (11)$$

where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution and $\hat F_Y(x_0)$ is
the empirical distribution function. Now let us look at the right hand side of
the interval in (11) and note that

$$P\{F_Y(x_0) \le \hat F_Y(x_0) + z_{1-\alpha/2}\tau\}$$
$$= P\{F_X(x_0) \le \hat F_Y(x_0) + F_X(x_0) - F_Y(x_0) + z_{1-\alpha/2}\tau\}$$
$$\le P\{F_X(x_0) \le \hat F_Y(x_0) + |F_X(x_0) - F_Y(x_0)| + z_{1-\alpha/2}\tau\}.$$

Applying the same argument to the left hand side of the interval in (11) we
obtain

$$P\{F_X(x_0) \in \{\hat F_Y(x_0) \pm [\,|F_X(x_0) - F_Y(x_0)| + z_{1-\alpha/2}\tau\,]\}\} \ge 1 - \alpha. \qquad (12)$$

Note that when there is no measurement error $F_X(x_0) = F_Y(x_0)$ and the interval
(12) reduces to that in (11). If the error is moderate, then we expect that the
interval (12) would be somewhat conservative but still reasonable. However,
this interval is based on unknown quantities and cannot be practically applied.
Therefore, we use its empirical counterpart by plugging in the estimators for $\tau$
and $F_X(x_0)$ as follows:

$$CI[F_X(x_0)] := \{\hat F_Y(x_0) \pm [\,|\hat F_A(x_0) - \hat F_Y(x_0)| + z_{1-\alpha/2}\hat\tau\,]\}, \qquad (13)$$

where $\hat F_A(x_0)$ stands for the adaptive estimator, $\hat F_Y(x_0)$ for the empirical
distribution function and

$$\hat\tau = \sqrt{\frac{\hat F_Y(x_0)(1 - \hat F_Y(x_0))}{n}}.$$

Simulation results presented in Table 6 indicate that the observed coverage
of this interval for $\alpha = 0.05$ was close to the nominal 95% level.
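
The plug-in interval in (13) is simple to compute once the two point estimates are available. The following is a minimal sketch, assuming the empirical and adaptive estimates at $x_0$ have already been obtained elsewhere; the function name `plug_in_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def plug_in_interval(f_y_hat, f_a_hat, n, alpha=0.05):
    """Interval centred at the empirical estimate f_y_hat, widened by the
    gap to the adaptive estimate f_a_hat plus a normal-approximation term."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)       # 1 - alpha/2 normal quantile
    tau_hat = sqrt(f_y_hat * (1.0 - f_y_hat) / n)     # binomial standard error
    half_width = abs(f_a_hat - f_y_hat) + z * tau_hat
    return f_y_hat - half_width, f_y_hat + half_width


# Inputs taken from the data example: naive 0.225, adaptive 0.21, n = 1615.
lower, upper = plug_in_interval(0.225, 0.21, 1615)
print(round(lower, 2), round(upper, 2))
```

Because the half-width depends on the estimates only through $\hat F_Y(1 - \hat F_Y)$ and the absolute gap, the same call works whether the inputs are values of the distribution function or of the prevalence $1 - F$.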
Table 6: Empirical coverage intervals and probabilities for estimating
$N(130.757, 17.528^2)$ under $N(0, 9.206^2)$ error based on 1000 samples of size
$n = 500$. Here $\alpha = 0.05$. The intervals and widths are averages over the
1000 samples.

$F_X(x_0)$   0.1           0.25          0.5           0.75          0.9
Interval     [0.08,0.15]   [0.22,0.30]   [0.45,0.55]   [0.70,0.78]   [0.85,0.92]
Width        0.07          0.09          0.1           0.09          0.07
Coverage     93.6%         94.1%         98.5%         94.1%         93.5%



4.3 Estimation in the data example 

We now turn to estimation of the hypertension prevalence. Here we assume that 
the measurement error is normally distributed, but unlike the above simulation 
study, no distributional assumption is made about X. 

The naive estimator in our case is $1 - \hat F_Y(140) = 0.225$ while the SIMEX
estimator is $1 - \hat F_S(140) = 0.184$. The adaptive estimator is $1 - \hat F_A(140) = 0.21$
and the interval given by (13) is [0.19, 0.26] (which does not include the SIMEX
estimator).

The fact that both the naive and the adaptive estimator yield similar
estimation results may give the wrong impression that these methods behave the
same. One then may prefer to use the naive estimator since it is more
straightforward to implement. However, although in the example above the results
are similar, in other examples they may differ substantially. This depends on
the estimated distribution, which of course is not known to us. This is well
illustrated by Figure 3, where we see one realization of estimating the normal
mixture $N(0.15827, 1) + N(1, 0.1225^2)$ under Laplace error (with scale $1/(2\sqrt{2})$)
for $n = 500$. The adaptive estimator adapts to the underlying smoothness of the
unknown normal mixture over all of its quantiles. The naive estimator, however,
behaves nicely in places where the underlying distribution is smooth but worse
where it is not. Thus, the adaptive methods guarantee that in general we do
better although in particular cases we may not.

4.4 Sensitivity analysis

In our example we used an estimate for the measurement error variance and not
the unknown true value. In this case a sensitivity analysis of our results to
different values of the error variance would be informative. Under the assumption
that both the estimated distribution and the error distribution are normal,
Searle (1992) provides an unbiased estimate for the variance of $\hat\sigma_\epsilon^2$,
which is

$$\mathrm{var}(\hat\sigma_\epsilon^2) = \frac{2\hat\sigma_\epsilon^4}{n(p-1) + 2}.$$

Figure 3: One realization of estimating the normal mixture $N(0.15827, 1) +
N(1, 0.1225^2)$ under Laplace error with scale $1/(2\sqrt{2})$. Sample size $n = 500$.
The solid line, dashed line, and dotted line correspond to the true distribution,
adaptive and naive estimators respectively.

Under the assumption that the error is normally distributed, we calculated
the adaptive estimator for a set of ten (equally spaced) values of $\sigma_\epsilon^2$ ranging
from $\hat\sigma_\epsilon^2 - 2\sqrt{\mathrm{var}(\hat\sigma_\epsilon^2)}$ to $\hat\sigma_\epsilon^2 + 2\sqrt{\mathrm{var}(\hat\sigma_\epsilon^2)}$. Specifically, in our case we have
$\sqrt{\mathrm{var}(\hat\sigma_\epsilon^2)} = 2.981$ and the different estimates are given in Table 7. We see
that the adaptive estimator stays very close to its initial value of 0.21 and is
smaller than the naive estimate in all cases. The interval's upper and lower
values (and width) show very little change. Thus, the adaptive estimator seems
in our example to be robust to the fact that we estimate the measurement error
variance.
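
The grid of error variances used in this sensitivity analysis can be reproduced with a few lines of arithmetic. The sketch below assumes the variance formula $2\hat\sigma_\epsilon^4/(n(p-1)+2)$, which is consistent with the reported value 2.981; the function names are hypothetical.

```python
from math import sqrt


def variance_of_variance_estimate(sigma2_hat, n, p):
    """Estimated variance of the error-variance estimate (assumed formula)."""
    return 2.0 * sigma2_hat ** 2 / (n * (p - 1) + 2)


def sensitivity_grid(sigma2_hat, n, p, points=10, width=2.0):
    """Equally spaced error variances within +/- width standard deviations."""
    sd = sqrt(variance_of_variance_estimate(sigma2_hat, n, p))
    lower = sigma2_hat - width * sd
    upper = sigma2_hat + width * sd
    step = (upper - lower) / (points - 1)
    return [lower + i * step for i in range(points)]


# Values from the data example: estimate 84.755, n = 1615 subjects, p = 2 exams.
sd = sqrt(variance_of_variance_estimate(84.755, 1615, 2))
grid = sensitivity_grid(84.755, 1615, 2)
print(round(sd, 3), [round(v, 3) for v in grid])
```

The adaptive estimator would then be recomputed once per grid value; that step is not shown here because it depends on the estimator's implementation.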



Table 7: Sensitivity analysis for the adaptive estimator.

$\sigma_\epsilon^2$    Estimator   Interval
78.793   0.209       [0.19, 0.26]
80.118   0.209       [0.19, 0.26]
81.443   0.209       [0.19, 0.26]
82.767   0.210       [0.19, 0.26]
84.092   0.210       [0.19, 0.26]
85.417   0.210       [0.19, 0.26]
86.742   0.211       [0.19, 0.26]
88.067   0.204       [0.18, 0.27]
89.391   0.205       [0.18, 0.27]
90.716   0.205       [0.18, 0.27]

5 Discussion

The problem of pointwise estimation of a distribution function in measurement
error models was studied. Our estimation method was based on a direct
inversion formula for the distribution function. This method was shown to be
minimax optimal for ordinary smooth error distributions in Dattner,
Goldenshluger and Juditsky (2011). We have shown here that it is also minimax optimal
for supersmooth error distributions and provided an adaptive version for this
case. In particular, we have shown that no price is paid in the rate of
convergence when adapting under supersmooth error distributions.

An extensive simulation study was carried out in order to study finite sample
properties of the aforementioned method. The adaptive estimator performs well
in different estimation setups and seems to be the only reasonable estimator
when the error distribution is not symmetric with non-zero expectation.

The application of our method to a real data example was examined and
different practical aspects were explored. In particular, the data we considered are
based on repeated measures and the estimation of the error variance was taken
into account by modifying our estimation procedure to allow for the estimation
of this parameter. The theoretical consequences of doing so are not yet known
but simulation results are promising and in our particular example the adaptive
estimator seems to be robust. The use of different assumptions for the error
distribution can result in different estimates. In our data example we assumed
that the measurement error is normally distributed. If the underlying error
distribution is Laplace then the adaptive estimator is $1 - \hat F_A(140) = 0.189$, while
if the error distribution is Gamma with shape parameter two and relocated to
have zero expectation, then the adaptive estimator is $1 - \hat F_A(140) = 0.178$.

This emphasizes the importance of developing methods without assuming a
distributional form for the error. This estimation problem has been thoroughly
studied for density deconvolution (see Johannes (2009) and references therein)
and similar paths may be taken for the distribution case. For instance, assuming
that we have at hand an additional sample of directly observed measurement
errors, we can estimate the characteristic function $\phi_\epsilon$ by its empirical version.
In general, this approach may lead to unstable results and it is preferable to
use a modified estimator in which only "good" estimates of $\phi_\epsilon$ are taken into
account. This method was shown to be minimax optimal for density
deconvolution in Neumann (1997) and we are able to show similar theoretical results for
distribution deconvolution. However, as already mentioned, this is not enough
for practical considerations and an adaptive version of the estimator is required.
The study of this problem is beyond the scope of this paper and will be
considered elsewhere.
Appendix

5.1 Proof of Theorem 1

The proof is based on the standard bias-variance decomposition

$$E|\hat F_\lambda(x_0) - F_X(x_0)|^2 = |E\hat F_\lambda(x_0) - F_X(x_0)|^2 + E|\hat F_\lambda(x_0) - E\hat F_\lambda(x_0)|^2$$
$$=: B_\lambda^2(F_X; x_0) + \mathrm{var}\{\hat F_\lambda(x_0)\}.$$



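5.1.1 Bounding the bias

Note that

$$E\hat F_\lambda(x_0) = \frac{1}{2} - \frac{1}{\pi}\int_0^\lambda \frac{1}{\omega}\,\Im\{e^{-i\omega x_0}\phi_X(\omega)\}\,d\omega.$$

Therefore it follows from the inversion formula that

$$B_\lambda(F_X; x_0) = \left|\frac{1}{\pi}\int_\lambda^\infty \omega^{-1}\,\Im(e^{-i\omega x_0}\phi_X(\omega))\,d\omega\right| \le \frac{1}{\pi}\int_\lambda^\infty \omega^{-1}|\phi_X(\omega)|\,d\omega.$$

For $\alpha \ge 0$, using the Cauchy-Schwarz inequality we obtain

$$B_\lambda(F_X; x_0) \le \frac{1}{\pi}\int_\lambda^\infty \frac{|\phi_X(\omega)|}{\omega}\,d\omega
\le \frac{1}{\pi}\left[\int_\lambda^\infty |\phi_X(\omega)|^2(1+\omega^2)^\alpha\,d\omega\right]^{1/2}\left[\int_\lambda^\infty \frac{1}{\omega^{2\alpha+2}}\,d\omega\right]^{1/2}
\le \sqrt{\frac{2}{\pi}}\,\frac{L}{\sqrt{2\alpha+1}}\,\lambda^{-\alpha-1/2}.$$

If $\alpha \in (-1/2, 0)$ then for any $\lambda \ge 1$

$$B_\lambda(F_X; x_0) \le \sqrt{\frac{2}{\pi}}\,L\,[1 + (2\alpha+1)^{-1/2}]\,\lambda^{-\alpha-1/2}.$$

Combining the two bounds we obtain the following bound for the bias of the
estimator,

$$\sup B_\lambda(F_X; x_0) \le K_\alpha L \lambda^{-\alpha-1/2}, \qquad K_\alpha := \sqrt{2/\pi}\,[1 + (2\alpha+1)^{-1/2}]. \qquad (14)$$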
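
For reference, the power of the bandwidth appearing in the bias bound comes from an elementary tail integral. The short derivation below is a worked step added for illustration, assuming $\alpha > -1/2$ so that the integral converges.

```latex
\int_\lambda^\infty \omega^{-2\alpha-2}\,d\omega
  = \left[\frac{\omega^{-2\alpha-1}}{-(2\alpha+1)}\right]_{\lambda}^{\infty}
  = \frac{\lambda^{-2\alpha-1}}{2\alpha+1},
\qquad\text{so}\qquad
\left[\int_\lambda^\infty \omega^{-2\alpha-2}\,d\omega\right]^{1/2}
  = \frac{\lambda^{-\alpha-1/2}}{\sqrt{2\alpha+1}} .
```

This is the source of both the rate $\lambda^{-\alpha-1/2}$ and the factor $(2\alpha+1)^{-1/2}$ in the constant.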

5.1.2 Bounding the variance 

The following lemma will be used in the sequel. 
Lemma 1. For any u,/i£t and xq € K one has 



oo r iu>(y-x ) 



<f> e (u) J [ 4> e (fJ.) 

Proof : Using (|T7|) we have 



p ,n(y-x ) 
3<! — r^- !>/y(y)*/ 



< 



\(j) Y (u - n)\ + |<M^ + /J,)\ 

2|^H||^(A*)| 



o iai(y-x ) 



oo />oo 



£ (w) J " |_ <£ 6 (/i) 

sin{cj(y - x - u)}sin{^(y - x Q - v)} f e (u) f e (v) dudv 



oo */ — OO 

OO /"OO 



— oq •/ — OO 

OO /» oo 



= I» 
2 



oo «/ — oo 
oo />oo 



cos{(w — [i)(y — xq) — uju + fiv} f e (u) f e (v) du dv 

cos{(w + fl)(y — Xq) — UJU — fiv} f e (u) f e (v) du dv 

f e (u)f e (v) dudv. 



oo J — oo 



Multiplying the last expression by /y (y), integrating over y and using the Fubini 
theorem we obtain 



o iui{y-x ) 



fy(y)dy 



OO /'OO 



oo — OO 



0eM J I </>e(» 

cj) Y (uj- iM)e- t{u} -^ Xo e lllv 

- ^>y(oj + n)e-^ +t " )xo e-^ v ]Mu)f e (v)dudv 

= isR {^H [<^)</>y (w - /i)e- 1 ^-^^ - <j> Y (w + ^e^^ ] } 

The result of the lemma immediately follows from the last relation. | 

By definition of Fx we can bound the variance of the estimator by the second 
moment as follows: 



var{F A } < -E(- 

n \ ir ,/ oj 



-$N — ; = Uoj 
Let



w a := min{o; , (26,)- 1 /-}, (15) 
where ujq and 6 e are defined in Assumption [2] Then we can write 



var{F A } < -^E 



Co' 



--: -?-(J 1 + 7 2 ), 
7Tn 



(16) 



and we bound ii and ^2 separately. 

1°. We begin with bounding I\. First note that 

9f{&- 1 (w)e* J(, '~ xo) } = \(t> e {uj)\- 2 ^^ {v - Xo) M^)} 



\M<»)\' 



sin{u>(y - x - u)}f e (u)du. (17) 



Therefore 
h =E 



^-00 



sin{cj(j/ - xp - h)} 



/ £ (u) G?it 



[A(l/)] 2 /V(i/)<fo. 



First, observe that |0 e (u;)|- 2 = 1 + r(u>), where r(w) := E^i(l0 e M| 2 - 1 ) fe - 
In addition, by Assumption [51 |r(w)| < E^Li(2^c| w | r ) fc f° r all M < w o- Hence 



Jl sin{u)(y - x - u)} 
|^(w)| 2 cj 



duj 



< 


1/ 




1 Jo 



sin{w(y - x - u)} 



duj 



,T\k 



+ 



|r( W )| 



do; 



< 2 + f;^<2 + r- 



fe=i 



where we have used the fact that sup a , >0 / t 1 s'mtdt < 1.85195 (see Kawata 



(1972)), the above upper bound on \r{ui)\ and the definition of uj\ in (TT5|) 
Therefore, by Fubini's and dominated convergence theorems , we get |/i(y)| _< 
(2 + t _1 ) for all y, which, in turn, implies that 



h < [2 + (l/r)] 2 . 



(18) 



2°. Now we bound Io- We have 



h = 



A r X 



1 



00 p iuj(y-x ) 



/wi W M 
Lemma [1] implies that 



I2 < 



! J 0)1 2wmI0£(^)I 



(/> e (w) 



duidfi - 



dujdfi. 



A ,-A 



by (U) + ft) I 



1 y Wl 2w/i|^ £ (w)| |<£ 6 (m)I 



dcodfi 
Using the Cauchy-Schwarz inequality we have



U If f x f x \Mu ~ J , \ 1/2 f f X f X !0y(^-/j)L . V /a 



(19) 

Because (<*>) — <j)x{^)4>e{^) and < 1 we have for any lo £ [wi, A] 

/A p A 

|0 e (w)| < ci / e" 71 ^ dw, 

-A V-A 

where we have used the upper bound in Assumption [TJ Substituting t = •yuj^ 
we see that 

•(.-,)l«^(^-V'»^<«fl, (20, 

where T(z) is the gamma function T(z) = J °° e~H z ~ x dt. Now, using the lower 
bound in Assumption Q] we obtain 



The last bound together with (|20|) and (|T9| leads to 

(1) 2 Cl r(i//3) 2 ^ 

which holds also for y . Therefore we conclude that 

3°. We now combine the bounds for Ii given in (I18p and the bound for J 2 
above together with p^|) to get 

var{F A (x )} < -j-{[2 + (1/r)] 2 + ^Tlft Ae 2 ^}. (21) 

5.1.3 Finding the optimal bandwidth

Recall the definition of $\omega_1$ given in (15). Let $\Gamma(z)$ be the gamma function $\Gamma(z) =
\int_0^\infty e^{-t} t^{z-1}\,dt$ and define

-^{1™* + ^}- < 22 > 

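The bound in (21) implies that for $\lambda \ge 1$ we have $\mathrm{var}\{\hat F_\lambda(x_0)\} \le c_\epsilon \lambda e^{2\gamma\lambda^\beta} n^{-1}$.
We now wish to balance the squared bias with the variance by solving for $\lambda$
the equation

$$c_\epsilon \lambda e^{2\gamma\lambda^\beta} n^{-1} = K_\alpha^2 L^2 \lambda^{-2\alpha-1}, \qquad (23)$$

where the constant $K_\alpha$ is given in (14). That yields

$$\lambda_* = \left[\frac{\ln n}{2\gamma} - \frac{\ln c_\epsilon + (2\alpha+2)\ln\lambda_* - 2\ln(K_\alpha L)}{2\gamma}\right]^{1/\beta}.$$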

Now note that for large enough n we have A* < \lnn/(2 / y)\ i ^ , thus 
. rlnn hLC £ +^ln(^)-2HKoL) 

A 2^ J 



as given in the theorem. Indeed, plugging A* in (|23|) and noting that for large 
enough n 

/lnn\ 1 /P , /Tnn\V0 

(17) - A *-b7) ' 

the theorem follows. I 
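
The balance between the squared bias bound and the variance bound has no closed-form solution, but it is easy to solve numerically. The sketch below works on the log scale with bisection; it assumes the balance equation has the form $c_\epsilon \lambda e^{2\gamma\lambda^\beta}/n = (K_\alpha L)^2\lambda^{-2\alpha-1}$, and all parameter values in the example are hypothetical.

```python
from math import log


def balance_residual(lam, n, c_eps, k_l, alpha, gamma, beta):
    """log(variance bound) minus log(squared bias bound) at bandwidth lam."""
    log_var = log(c_eps) + log(lam) + 2.0 * gamma * lam ** beta - log(n)
    log_bias2 = 2.0 * log(k_l) - (2.0 * alpha + 1.0) * log(lam)
    return log_var - log_bias2


def solve_bandwidth(n, c_eps, k_l, alpha, gamma, beta, lo=1e-6, hi=1e3):
    """Bisection for the root; the residual is increasing in lam when alpha > -1."""
    args = (n, c_eps, k_l, alpha, gamma, beta)
    if balance_residual(lo, *args) > 0 or balance_residual(hi, *args) < 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if balance_residual(mid, *args) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical setting: normal-type error (gamma = 0.5, beta = 2), alpha = 1,
# unit constants c_eps = 1 and K_alpha * L = 1, sample size n = 500.
lam_star = solve_bandwidth(500, 1.0, 1.0, 1.0, 0.5, 2.0)
upper = (log(500) / (2 * 0.5)) ** (1 / 2.0)
print(lam_star, upper)
```

In this example the numerical solution lies below $[\ln n/(2\gamma)]^{1/\beta}$, in line with the logarithmic order of the optimal bandwidth under supersmooth errors.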



5.2 Proof of Theorem 2 

The idea is to choose $\lambda$ smaller than the optimal $\lambda_*$ so that the bias
becomes dominant. To this end, note that for large enough $n$

2a + 2 , An 
— < In f 



/mn\ 
\~2rf~) 



which implies that 



{inn « ; L ^(W)] ^ < flnn ^ + ggg In (jg) - 21n(jf L) ^ 
I 2t 2-7 J " I 27 2-7 J 



27 27 J ~ L 27 27 

Therefore e 27 ^ < e 2jX * . Finally, here also for large enough n we have A > 
[(In n)/{4rfj\ thus, plugging back in (j2"3")l these bounds for A the theorem 

follows. I 



5.3 Tuning of the adaptive algorithm 

Here we describe in detail the tuning of the adaptive algorithm. As already
mentioned above, theoretically, $K_\epsilon$ depends only on the error distribution which
is assumed to be completely known, and its exact value can be computed for any
error distribution explicitly (see Dattner, Goldenshluger and Juditsky (2011)).
However, numerical experience suggests that the theoretical value of $K_\epsilon$ is too
conservative. Thus, in practice we calibrated the adaptive algorithm as follows.

We set $X$ to be standard normal, $\epsilon$ to be Laplace with standard deviation
$\sigma_\epsilon$, $x_0$ to be the value for which $F_X(x_0) = 0.25$, and the sample size to be $n = 2000$. The
standard deviation of the measurement error takes the values $\sigma_\epsilon = 0.05(0.1)0.95$.
Let $\hat\sigma_\lambda$ be defined as in the main text. For each $\sigma_\epsilon$ we estimated $F_X(x_0)$ using the interval

$$\left[\hat F_\lambda(x_0) - c_\epsilon\left[\frac{\ln(n)}{n}\right]^{1/2}\hat\sigma_\lambda,\ \ \hat F_\lambda(x_0) + c_\epsilon\left[\frac{\ln(n)}{n}\right]^{1/2}\hat\sigma_\lambda\right]$$

for a set of different values of $c_\epsilon = 0.01(0.02)10$. This procedure is repeated a
hundred times and the value of $c_\epsilon$ which minimizes the empirical root mean square
error of the adaptive estimator is chosen, and denoted by $c_{\sigma_\epsilon,1}$. This calculation
was repeated fifty times, which resulted in the fifty values $c_{\sigma_\epsilon,1}, \ldots, c_{\sigma_\epsilon,50}$. The
mean of these values was taken and is denoted by $c_{\sigma_\epsilon}$. This results in ten values
of $c_{\sigma_\epsilon}$ corresponding to the ten values of $\sigma_\epsilon$. Then a simple regression with the
values of $\sigma_\epsilon$ as the independent variable, and those of $c_{\sigma_\epsilon}$ as the dependent
variable, results in the rule $K_\epsilon := c_{\sigma_\epsilon} = 0.0275 + 0.3074\sigma_\epsilon$.

We note that the choice of $X$ to be standard normal and $F_X(x_0) = 0.25$
in our calibration is arbitrary, at least theoretically. As mentioned above, the
theoretical value of $K_\epsilon$ depends only on the error distribution. Indeed,
calibration with different choices for the distribution of $X$ and the value of $x_0$ yielded
similar results for a given error distribution.

We further note that our study of the practical choice of $K_\epsilon$ is based on values
of $\sigma_\epsilon$ smaller than one. If $\sigma_\epsilon$ is larger than one, we standardize the observed
sample so that it has zero mean and unit standard deviation. Then we use
a standardized form of $\sigma_\epsilon$ in our procedure, i.e., the estimate $\hat\sigma_\epsilon/\hat\sigma_Y$, where $\hat\sigma_Y$
is estimated from the observations.
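
The calibrated rule and the standardization step can be summarized in a small helper. This is only a sketch of the rule as stated above (which was calibrated for Laplace error); the function name `tuning_constant` and the value 19.8 used for the standard deviation of the observations are hypothetical.

```python
def tuning_constant(sigma_eps, sigma_y=None):
    """Practical tuning constant from the fitted rule 0.0275 + 0.3074 * sigma_eps.

    If sigma_eps exceeds one, the rule is applied to the standardized
    value sigma_eps / sigma_y, where sigma_y is the standard deviation of
    the observed sample (the data themselves should be standardized too).
    """
    if sigma_eps > 1.0:
        if sigma_y is None or sigma_y <= 0.0:
            raise ValueError("sigma_y is required when sigma_eps exceeds one")
        sigma_eps = sigma_eps / sigma_y
    return 0.0275 + 0.3074 * sigma_eps


# Error standard deviation inside the calibrated range.
print(tuning_constant(0.5))
# Error standard deviation above one, with a hypothetical sigma_y of 19.8.
print(tuning_constant(9.206, sigma_y=19.8))
```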